Towards Speaker Detection using Lips Movements for Human-Machine Multiparty Dialogue
Authors
Abstract
This paper explores the use of lip movements for speaker and voice activity detection, a task that is essential in multimodal multiparty human-machine dialogue. The task aims at detecting who is speaking, and when, among a set of persons. A multiparty dialogue with 4 speakers is audiovisually recorded and then annotated for speaker and speech/silence segments. Lip movements are tracked using FaceAPI, a commercial real-time face-tracking software package. The paper reports results from 3 classification techniques: neural networks, naïve Bayes classifiers, and Mahalanobis distance. In speech/silence detection, the experiments show promising results using lip movements, with an optimal accuracy of 78.31%. The results also show that the neural network classifier outperforms the other techniques in the speaker-dependent and hybrid settings. In the speaker-independent setting, however, the naïve Bayes classifier performs best, with an accuracy of 64.56%.
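The abstract names Mahalanobis distance as one of the three classifiers. The paper does not publish its feature set or code, so the sketch below is only an illustration of the general technique: fit a mean and covariance per class (speech vs. silence) from lip-movement feature vectors, then assign a new frame to the class with the smaller Mahalanobis distance. The feature values and class names are hypothetical.

```python
import numpy as np

def fit_class(features):
    """Estimate the mean and inverse covariance for one class."""
    mu = features.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(features, rowvar=False))
    return mu, inv_cov

def mahalanobis(x, mu, inv_cov):
    """Mahalanobis distance of x from a class model."""
    d = x - mu
    return float(np.sqrt(d @ inv_cov @ d))

def classify(x, models):
    """Assign x to the class whose model is nearest in Mahalanobis distance."""
    return min(models, key=lambda c: mahalanobis(x, *models[c]))

# Synthetic 2-D lip-movement features (e.g. lip-opening amount and velocity);
# speech frames show larger, more variable motion than silence frames.
rng = np.random.default_rng(0)
speech = rng.normal(1.0, 0.3, size=(200, 2))
silence = rng.normal(0.0, 0.1, size=(200, 2))

models = {"speech": fit_class(speech), "silence": fit_class(silence)}
print(classify(np.array([0.9, 1.1]), models))    # a high-motion frame
print(classify(np.array([0.02, -0.01]), models)) # a near-still frame
```

Using the full covariance (rather than plain Euclidean distance) lets the classifier account for correlated lip-movement features, which is the usual motivation for choosing Mahalanobis distance in this kind of per-class model.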
Similar papers
Towards Speaker Detection using FaceAPI Facial Movements in Human-Machine Multiparty Dialogue
In a multiparty multimodal dialogue setup, where the robot is set to interact with multiple people, a main requirement for the robot is to recognize the user speaking to it. This would allow the robot to pay visual attention to the person it is listening to (for example, directing its gaze and head pose towards the speaker), and to organize the dialogue structure with multiple people. Knowi...
Who’s next? Speaker-selection mechanisms in multiparty dialogue
Participants in conversations have a wide range of verbal and nonverbal expressions at their disposal to signal their intention to occupy the speaker role. This paper addresses two main questions: (1) How do dialogue participants signal their intention to have the next turn, and (2) What aspects of a participant’s behaviour are perceived as signals to determine who should be the next speaker? O...
Visual speech detection using OpenCV
Visual information from the human face, such as lip and tongue movements, provides a great deal of information about the spoken message and helps in understanding verbal communication. Visual speech detection overcomes some of the persistent problems and inaccuracies that creep in when there is background noise. In noisy environments we pay more attention to the lips, which dra...
Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
Automatic summarization of open-domain spoken dialogues is a relatively new research area. This article introduces the task and the challenges involved and motivates and presents an approach for obtaining automatic-extract summaries for human transcripts of multiparty dialogues of four different genres, without any restriction on domain. We address the following issues, which are intrinsic to s...
The furhat social companion talking head
In this demonstrator we present the Furhat robot head. Furhat is a highly human-like robot head in terms of dynamics, thanks to its use of back-projected facial animation. Furhat also takes advantage of a complex and advanced dialogue toolkit designed to facilitate rich and fluent multimodal multiparty situated spoken human-machine dialogue. The demonstrator will present a social dialogue ...
Publication date: 2012